Overview

Dataset statistics

Number of variables: 31
Number of observations: 168120
Missing cells: 1777427
Missing cells (%): 34.1%
Duplicate rows: 541
Duplicate rows (%): 0.3%
Total size in memory: 128.8 MiB
Average record size in memory: 803.1 B
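These overview figures can be checked by hand. The missing-cell share is the number of missing cells divided by variables × observations: 1777427 / (31 × 168120) ≈ 34.1%. Likewise, 541 / 168120 ≈ 0.3% of rows are duplicates. The sketch below computes the same figures for a small table held as a list of rows. It assumes that missing cells are `None`. It also assumes that a duplicate is any row identical to an earlier one, so the first occurrence is not counted, which matches the defaults of pandas' `DataFrame.duplicated()`. Whether pandas-profiling counts duplicates exactly this way is an assumption. `overview_stats` is a hypothetical helper.

```python
from typing import Any, Optional, Sequence

def overview_stats(rows: Sequence[Sequence[Optional[Any]]]) -> dict:
    """Variables, observations, missing cells and duplicate rows of a list-of-rows table.

    Assumption: a missing cell is represented by None (pandas would use NaN).
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None)
    # A row is a duplicate if an identical row appeared earlier;
    # the first occurrence of each repeated row is not counted.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }

# The report's own arithmetic:
print(round(100 * 1777427 / (31 * 168120), 1))  # 34.1
print(round(100 * 541 / 168120, 1))             # 0.3
```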

Variable types

CAT (categorical): 16
NUM (numeric): 9
UNSUPPORTED: 4
BOOL (boolean): 2

Reproduction

Analysis started: 2020-04-13 04:10:44.948190
Analysis finished: 2020-04-13 04:12:52.521783
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration file: config.yaml

Warnings

Dataset has 541 (0.3%) duplicate rows [Duplicates]
MUNIC_RES has high cardinality: 566 distinct values [High cardinality]
NASC has high cardinality: 32641 distinct values [High cardinality]
DT_INTER has high cardinality: 4487 distinct values [High cardinality]
DIAG_PRINC has high cardinality: 908 distinct values [High cardinality]
DIAG_SECUN has high cardinality: 697 distinct values [High cardinality]
MUNIC_MOV has high cardinality: 146 distinct values [High cardinality]
DIAGSEC1 has high cardinality: 600 distinct values [High cardinality]
DIAGSEC2 has high cardinality: 112 distinct values [High cardinality]
HOSP has high cardinality: 173 distinct values [High cardinality]
BAIRRO_RES has high cardinality: 571 distinct values [High cardinality]
VAL_SH is highly correlated with UTI_MES_TO and 1 other field [High correlation]
UTI_MES_TO is highly correlated with VAL_SH [High correlation]
VAL_SP is highly correlated with VAL_SH [High correlation]
DIAG_SECUN has 84081 (50.0%) missing values [Missing]
ETNIA has 36659 (21.8%) missing values [Missing]
DIAGSEC1 has 161680 (96.2%) missing values [Missing]
DIAGSEC2 has 167827 (99.8%) missing values [Missing]
DIAGSEC3 has 168052 (> 99.9%) missing values [Missing]
DIAGSEC4 has 168113 (> 99.9%) missing values [Missing]
DIAGSEC5 has 168119 (> 99.9%) missing values [Missing]
DIAGSEC6 has 168120 (100.0%) missing values [Missing]
DIAGSEC7 has 168120 (100.0%) missing values [Missing]
DIAGSEC8 has 168120 (100.0%) missing values [Missing]
DIAGSEC9 has 168120 (100.0%) missing values [Missing]
HOSP has 92037 (54.7%) missing values [Missing]
BAIRRO_RES has 58379 (34.7%) missing values [Missing]
UTI_INT_TO is highly skewed (γ1 = 69.38404365) [Skewed]
NACIONAL is highly skewed (γ1 = 72.1097067) [Skewed]
NASC only contains datetime values, but is categorical. Consider applying pd.to_datetime() [Type]
DT_INTER only contains datetime values, but is categorical. Consider applying pd.to_datetime() [Type]
DIAGSEC6 is an unsupported type; check whether it needs cleaning or further analysis [Rejected]
DIAGSEC7 is an unsupported type; check whether it needs cleaning or further analysis [Rejected]
DIAGSEC8 is an unsupported type; check whether it needs cleaning or further analysis [Rejected]
DIAGSEC9 is an unsupported type; check whether it needs cleaning or further analysis [Rejected]
UTI_MES_TO has 143218 (85.2%) zeros [Zeros]
UTI_INT_TO has 167961 (99.9%) zeros [Zeros]
DIAS_PERM has 2756 (1.6%) zeros [Zeros]
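The high-cardinality warnings flag categorical columns whose distinct count exceeds a threshold. This sketch assumes a threshold of 50. That value is consistent with the warnings above: DIAGSEC2 (112 distinct values) is flagged, while DIAGSEC3 (33) is not. The actual setting lives in config.yaml and is not shown in this report. `high_cardinality` is a hypothetical helper.

```python
from typing import Dict, List

def high_cardinality(distinct_counts: Dict[str, int], threshold: int = 50) -> List[str]:
    """Names (sorted) of columns whose number of distinct values exceeds `threshold`.

    The default of 50 is an assumption chosen to agree with this report's warnings.
    """
    return sorted(name for name, n in distinct_counts.items() if n > threshold)

# Distinct counts taken from the categorical variable sections of this report.
counts = {"MUNIC_MOV": 146, "DIAGSEC2": 112, "DIAGSEC3": 33, "ETNIA": 2, "SEXO": 2}
print(high_cardinality(counts))  # ['DIAGSEC2', 'MUNIC_MOV']
```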

Variables

CEP
Real number (ℝ≥0)

Distinct count: 18900
Unique (%): 11.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42162280.73229241
Minimum: 1001000
Maximum: 98880000
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1001000
5th percentile: 40230320
Q1: 40715315
Median: 41310220
Q3: 42850000
95th percentile: 47800114
Maximum: 98880000
Range: 97879000
Interquartile range (IQR): 2134685
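The derived figures follow directly from the quantiles. IQR = Q3 - Q1, and for CEP 42850000 - 40715315 = 2134685. Range = maximum - minimum, which gives 98880000 - 1001000 = 97879000. The sketch below computes the same summary with Python's `statistics.quantiles` using `method="inclusive"`, which interpolates linearly between order statistics like the default of pandas' `Series.quantile`. That pandas-profiling used this exact interpolation is an assumption. `quartile_summary` is a hypothetical helper.

```python
import statistics
from typing import Sequence

def quartile_summary(xs: Sequence[float]) -> dict:
    """Q1, median, Q3, IQR and range, with linear interpolation between order statistics."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {"Q1": q1, "median": median, "Q3": q3, "IQR": q3 - q1, "range": max(xs) - min(xs)}

# A small, mostly-zero sample (shaped like UTI_MES_TO): Q1 and the median are 0.
print(quartile_summary([0, 0, 0, 0, 8, 99]))
# {'Q1': 0.0, 'median': 0.0, 'Q3': 6.0, 'IQR': 6.0, 'range': 99}
```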

Descriptive statistics

Standard deviation: 2371603.988
Coefficient of variation (CV): 0.05624942358
Kurtosis: 49.80638208
Mean: 42162280.73
Mean absolute deviation (MAD): 1702109.874
Skewness: 1.499763734
Sum: 7.088322637e+12
Variance: 5.624505477e+12
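The coefficient of variation is the standard deviation divided by the mean; for CEP, 2371603.988 / 42162280.73 ≈ 0.0562. The "MAD" figures in this report behave like the mean absolute deviation around the mean, not the median absolute deviation. UTI_MES_TO, for example, has median 0 and 85.2% zeros, so its median absolute deviation would be 0, yet the reported value is 2.19. The sketch below computes both statistics, plus the standard deviation and CV, on a small made-up sample with a similar shape. It uses the sample standard deviation (n - 1 denominator), which is the pandas default. `dispersion` is a hypothetical helper.

```python
import statistics
from typing import Sequence

def dispersion(xs: Sequence[float]) -> dict:
    """Sample standard deviation, CV, and the two different statistics abbreviated 'MAD'."""
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return {
        "std": std,
        "cv": std / mean,  # coefficient of variation = std / mean
        # mean absolute deviation around the mean
        "mean_abs_dev": statistics.fmean(abs(x - mean) for x in xs),
        # median absolute deviation around the median
        "median_abs_dev": statistics.median(abs(x - median) for x in xs),
    }

# Mostly zeros with a long right tail, like UTI_MES_TO.
d = dispersion([0, 0, 0, 0, 0, 0, 0, 0, 2, 8])
print(d["median_abs_dev"], d["mean_abs_dev"])  # 0.0 1.6
```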
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1001000. 7466287.5 39805495. 40005495. 40010015. ... 56316195. 56317403. 56481602. 59262485. 98880000. ], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
43700000 | 7373 | 4.4%
42700000 | 4497 | 2.7%
44470000 | 3175 | 1.9%
44460000 | 2987 | 1.8%
41250000 | 2929 | 1.7%
40415000 | 2464 | 1.5%
42802580 | 2440 | 1.5%
40050410 | 2002 | 1.2%
42850000 | 1947 | 1.2%
42820000 | 1783 | 1.1%
Other values (18890) | 136523 | 81.2%

Smallest values
Value | Count | Frequency (%)
1001000 | 1 | < 0.1%
1310935 | 1 | < 0.1%
1509970 | 1 | < 0.1%
2325529 | 1 | < 0.1%
3242020 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
98880000 | 1 | < 0.1%
96880970 | 1 | < 0.1%
96880000 | 2 | < 0.1%
96504182 | 1 | < 0.1%
94960100 | 1 | < 0.1%

MUNIC_RES
Categorical

HIGH CARDINALITY
Distinct count: 566
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
Salvador | 110557 | 65.8%
Simões Filho | 7351 | 4.4%
Camaçari | 6657 | 4.0%
Lauro de Freitas | 4929 | 2.9%
Vera Cruz | 3182 | 1.9%
Itaparica | 2976 | 1.8%
Candeias | 2738 | 1.6%
Dias d'Ávila | 1955 | 1.2%
Mata de São João | 1766 | 1.1%
São Sebastião do Passé | 1406 | 0.8%
Other values (556) | 24603 | 14.6%
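Each categorical frequency table lists the 10 most frequent values and folds the rest into an "Other values (m)" row. For MUNIC_RES, 566 distinct values minus the 10 listed gives the 556 in that row, and Salvador's 110557 / 168120 ≈ 65.8%. The sketch below builds such a table with `collections.Counter`. `frequency_table` and its `k` parameter are hypothetical. Missing-value handling is left out: the report lists missing values in a separate "(Missing)" row.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

def frequency_table(values: Iterable[Hashable], k: int = 10) -> List[Tuple[str, int, float]]:
    """Top-k (value, count, percent) rows, plus an 'Other values (m)' row for the remainder."""
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(k)
    rows = [(str(v), c, 100.0 * c / total) for v, c in top]
    rest = len(counts) - len(top)  # number of distinct values not listed individually
    if rest > 0:
        other = total - sum(c for _, c in top)
        rows.append((f"Other values ({rest})", other, 100.0 * other / total))
    return rows

for row in frequency_table(["Salvador"] * 5 + ["Camaçari"] * 2 + ["Candeias", "Vera Cruz"], k=2):
    print(row)
```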
 

Length

Max length: 22
Mean length: 8.5109862
Min length: 6

Unicode category | Count | Frequency (%)
Lowercase_Letter | 23 | 50.0%
Uppercase_Letter | 11 | 23.9%
Decimal_Number | 10 | 21.7%
Space_Separator | 1 | 2.2%
Other_Punctuation | 1 | 2.2%

Unicode script | Count | Frequency (%)
Latin | 34 | 73.9%
Common | 12 | 26.1%

Unicode block | Count | Frequency (%)
ASCII | 41 | 100.0%
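The tables under Length group the characters that occur in a column by Unicode general category, script and block. The counts appear to be numbers of distinct characters: SEXO, whose only values are 1 and 3, shows Decimal_Number 2. The sketch below reproduces the category table with the standard `unicodedata` module. The long category names are mapped by hand from the two-letter codes that `unicodedata.category` returns, and only the codes seen in this report are included. Script and block lookups are not in the standard library and are omitted. `distinct_chars_by_category` is a hypothetical helper.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Hand-written mapping from general-category codes to the long names used in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase_Letter", "Ll": "Lowercase_Letter", "Lo": "Other_Letter",
    "Nd": "Decimal_Number", "Zs": "Space_Separator", "Po": "Other_Punctuation",
    "Pd": "Dash_Punctuation", "Ps": "Open_Punctuation", "Pe": "Close_Punctuation",
}

def distinct_chars_by_category(values: Iterable[str]) -> Counter:
    """Count the distinct characters used across all values, per Unicode general category."""
    chars = set("".join(values))
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch)) for ch in chars
    )

print(distinct_chars_by_category(["Salvador", "Dias d'Ávila"]))
```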
 

NASC
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 32641
Unique (%): 19.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
1993-04-16 | 125 | 0.1%
1936-02-10 | 104 | 0.1%
1920-10-12 | 97 | 0.1%
1980-02-02 | 92 | 0.1%
1963-05-01 | 66 | < 0.1%
2011-12-07 | 56 | < 0.1%
1980-07-11 | 52 | < 0.1%
2008-01-18 | 51 | < 0.1%
2006-05-28 | 50 | < 0.1%
2008-01-24 | 48 | < 0.1%
Other values (32631) | 167379 | 99.6%

Length

Max length: 10
Mean length: 10
Min length: 10

Unicode category | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Dash_Punctuation | 1 | 9.1%

Unicode script | Count | Frequency (%)
Common | 11 | 100.0%

Unicode block | Count | Frequency (%)
ASCII | 11 | 100.0%
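NASC (and likewise DT_INTER) stores ISO dates (YYYY-MM-DD) as text, which is why the report suggests `pd.to_datetime()`. In pandas, `pd.to_datetime(df["NASC"], errors="coerce")` converts the column and turns unparseable entries into `NaT`, assuming the data sits in a DataFrame named `df`. The standard-library sketch below does the equivalent for a plain list of strings and maps anything that is not a valid date to `None`. `parse_iso_dates` is a hypothetical helper.

```python
from datetime import date
from typing import Iterable, List, Optional

def parse_iso_dates(values: Iterable[str]) -> List[Optional[date]]:
    """Parse 'YYYY-MM-DD' strings into date objects; unparseable values become None."""
    parsed = []
    for v in values:
        try:
            parsed.append(date.fromisoformat(v))
        except (TypeError, ValueError):  # not a string, or not a valid ISO date
            parsed.append(None)
    return parsed

print(parse_iso_dates(["1993-04-16", "1936-02-10", "not a date"]))
# [datetime.date(1993, 4, 16), datetime.date(1936, 2, 10), None]
```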
 

SEXO
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
1 | 90827 | 54.0%
3 | 77293 | 46.0%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode category | Count | Frequency (%)
Decimal_Number | 2 | 100.0%

Unicode script | Count | Frequency (%)
Common | 2 | 100.0%

Unicode block | Count | Frequency (%)
ASCII | 2 | 100.0%

UTI_MES_TO
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 85
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.27984177968118
Minimum: 0
Maximum: 99
Zeros: 143218
Zeros (%): 85.2%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 8
Maximum: 99
Range: 99
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.642449423
Coefficient of variation (CV): 3.627361988
Kurtosis: 51.45973925
Mean: 1.27984178
Mean absolute deviation (MAD): 2.189940202
Skewness: 5.990889602
Sum: 215167
Variance: 21.55233664
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 3.5 4.5 ... 38.5 45.5 60.5 71.5 99. ], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
0 | 143218 | 85.2%
2 | 2993 | 1.8%
1 | 2823 | 1.7%
3 | 2500 | 1.5%
7 | 2026 | 1.2%
4 | 2022 | 1.2%
5 | 1700 | 1.0%
6 | 1578 | 0.9%
8 | 1059 | 0.6%
9 | 822 | 0.5%
Other values (75) | 7379 | 4.4%

Smallest values
Value | Count | Frequency (%)
0 | 143218 | 85.2%
1 | 2823 | 1.7%
2 | 2993 | 1.8%
3 | 2500 | 1.5%
4 | 2022 | 1.2%

Largest values
Value | Count | Frequency (%)
99 | 1 | < 0.1%
96 | 2 | < 0.1%
93 | 1 | < 0.1%
92 | 3 | < 0.1%
91 | 1 | < 0.1%

UTI_INT_TO
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 37
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.010022602902688556
Minimum: 0
Maximum: 66
Zeros: 167961
Zeros (%): 99.9%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 66
Range: 66
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.453869594
Coefficient of variation (CV): 45.28460305
Kurtosis: 6362.349065
Mean: 0.0100226029
Mean absolute deviation (MAD): 0.02002624799
Skewness: 69.38404365
Sum: 1685
Variance: 0.2059976083
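The γ1 of 69.38 flagged for this column is a sample skewness estimate. It presumably comes from pandas' `Series.skew()`, which returns the bias-adjusted Fisher-Pearson coefficient G1 = sqrt(n(n-1)) / (n-2) · m3 / m2^(3/2), where m2 and m3 are the second and third central moments. The sketch below implements that estimator. `sample_skewness` is a hypothetical helper. A column that is almost all zeros with a few large values, as here, produces a large positive skew.

```python
import math
from typing import Sequence

def sample_skewness(xs: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 = sqrt(n(n-1))/(n-2) * m3 / m2**1.5."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data: treat skewness as 0
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

print(sample_skewness([1, 2, 3, 4, 5]))                # 0.0 for symmetric data
print(sample_skewness([0] * 97 + [1, 5, 66]) > 5)      # True: long right tail
```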
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 6.5 15.5 31.5 66. ], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
0 | 167961 | 99.9%
3 | 15 | < 0.1%
6 | 15 | < 0.1%
2 | 14 | < 0.1%
1 | 11 | < 0.1%
4 | 11 | < 0.1%
5 | 10 | < 0.1%
9 | 10 | < 0.1%
10 | 8 | < 0.1%
7 | 6 | < 0.1%
Other values (27) | 59 | < 0.1%

Smallest values
Value | Count | Frequency (%)
0 | 167961 | 99.9%
1 | 11 | < 0.1%
2 | 14 | < 0.1%
3 | 15 | < 0.1%
4 | 11 | < 0.1%

Largest values
Value | Count | Frequency (%)
66 | 1 | < 0.1%
54 | 1 | < 0.1%
42 | 1 | < 0.1%
37 | 1 | < 0.1%
36 | 1 | < 0.1%

VAL_SH
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 51772
Unique (%): 30.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1259.4719468831788
Minimum: 19.03
Maximum: 54514.88
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 19.03
5th percentile: 160.62
Q1: 467.47
Median: 544.07
Q3: 896.4125
95th percentile: 4986.065
Maximum: 54514.88
Range: 54495.85
Interquartile range (IQR): 428.9425

Descriptive statistics

Standard deviation: 2438.11638
Coefficient of variation (CV): 1.935824284
Kurtosis: 55.2629372
Mean: 1259.471947
Mean absolute deviation (MAD): 1200.528299
Skewness: 6.042422989
Sum: 211742423.7
Variance: 5944411.484
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.9030000e+01 2.4750000e+01 3.0970000e+01 3.2405000e+01 3.3840000e+01 ... 2.0449205e+04 2.3521360e+04 3.0419135e+04 4.1608715e+04 5.4514880e+04], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
504.07 | 12829 | 7.6%
512.07 | 3364 | 2.0%
528.07 | 3135 | 1.9%
520.07 | 3046 | 1.8%
453.48 | 2710 | 1.6%
536.07 | 2546 | 1.5%
451.47 | 2097 | 1.2%
480.07 | 2074 | 1.2%
544.07 | 1694 | 1.0%
241.31 | 1557 | 0.9%
Other values (51762) | 133068 | 79.2%

Smallest values
Value | Count | Frequency (%)
19.03 | 5 | < 0.1%
30.47 | 657 | 0.4%
31.47 | 38 | < 0.1%
33.34 | 582 | 0.3%
34.34 | 16 | < 0.1%

Largest values
Value | Count | Frequency (%)
54514.88 | 1 | < 0.1%
53127.45 | 1 | < 0.1%
51369.08 | 1 | < 0.1%
51199.15 | 1 | < 0.1%
50812.25 | 1 | < 0.1%

VAL_SP
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 11052
Unique (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 207.36150808945993
Minimum: 5.1
Maximum: 15383.27
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 5.1
5th percentile: 25.71
Q1: 55.69
Median: 78.35
Q3: 173.29
95th percentile: 852.6215
Maximum: 15383.27
Range: 15378.17
Interquartile range (IQR): 117.6

Descriptive statistics

Standard deviation: 401.136689
Coefficient of variation (CV): 1.934479994
Kurtosis: 85.73160157
Mean: 207.3615081
Mean absolute deviation (MAD): 206.6830184
Skewness: 6.527403945
Sum: 34861616.74
Variance: 160910.6432
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[5.100000e+00 5.210000e+00 5.330000e+00 5.585000e+00 6.405000e+00 ... 3.271180e+03 4.191710e+03 4.960975e+03 6.932780e+03 1.538327e+04], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
78.35 | 47238 | 28.1%
25.71 | 8637 | 5.1%
26.51 | 7429 | 4.4%
29.4 | 5711 | 3.4%
183.91 | 5573 | 3.3%
74.62 | 4484 | 2.7%
117.52 | 4126 | 2.5%
24.1 | 1857 | 1.1%
44.1 | 1508 | 0.9%
9.91 | 1328 | 0.8%
Other values (11042) | 80229 | 47.7%

Smallest values
Value | Count | Frequency (%)
5.1 | 5 | < 0.1%
5.32 | 60 | < 0.1%
5.34 | 1 | < 0.1%
5.58 | 4 | < 0.1%
5.59 | 224 | 0.1%

Largest values
Value | Count | Frequency (%)
15383.27 | 1 | < 0.1%
14247.6 | 1 | < 0.1%
12311.11 | 1 | < 0.1%
12148.16 | 1 | < 0.1%
11945.28 | 1 | < 0.1%

DT_INTER
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 4487
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
2008-01-01 | 215 | 0.1%
2011-12-28 | 137 | 0.1%
2012-12-01 | 126 | 0.1%
2017-07-01 | 124 | 0.1%
2017-05-01 | 119 | 0.1%
2017-06-01 | 118 | 0.1%
2019-06-01 | 116 | 0.1%
2014-08-01 | 115 | 0.1%
2016-09-01 | 112 | 0.1%
2015-05-01 | 107 | 0.1%
Other values (4477) | 166831 | 99.2%

Length

Max length: 10
Mean length: 10
Min length: 10

Unicode category | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Dash_Punctuation | 1 | 9.1%

Unicode script | Count | Frequency (%)
Common | 11 | 100.0%

Unicode block | Count | Frequency (%)
ASCII | 11 | 100.0%

DIAG_PRINC
Categorical

HIGH CARDINALITY
Distinct count: 908
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J189 | 23922 | 14.2%
J159 | 18419 | 11.0%
J960 | 18136 | 10.8%
J180 | 13980 | 8.3%
J188 | 10166 | 6.0%
J459 | 7228 | 4.3%
J158 | 6832 | 4.1%
J353 | 5880 | 3.5%
J219 | 5413 | 3.2%
J449 | 3318 | 2.0%
Other values (898) | 54826 | 32.6%

Length

Max length: 4
Mean length: 3.928414228
Min length: 3

Unicode category | Count | Frequency (%)
Uppercase_Letter | 21 | 67.7%
Decimal_Number | 10 | 32.3%

Unicode script | Count | Frequency (%)
Latin | 21 | 67.7%
Common | 10 | 32.3%

Unicode block | Count | Frequency (%)
ASCII | 31 | 100.0%

DIAG_SECUN
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 697
Unique (%): 0.8%
Missing: 84081
Missing (%): 50.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 70120 | 41.7%
Y099 | 1599 | 1.0%
J960 | 1276 | 0.8%
R060 | 1254 | 0.7%
J90 | 1059 | 0.6%
A150 | 831 | 0.5%
J189 | 606 | 0.4%
J188 | 601 | 0.4%
Y86 | 449 | 0.3%
J450 | 448 | 0.3%
Other values (687) | 5796 | 3.4%
(Missing) | 84081 | 50.0%

Length

Max length: 4
Mean length: 2.235302165
Min length: 1

Unicode category | Count | Frequency (%)
Uppercase_Letter | 25 | 67.6%
Decimal_Number | 10 | 27.0%
Lowercase_Letter | 2 | 5.4%

Unicode script | Count | Frequency (%)
Latin | 27 | 73.0%
Common | 10 | 27.0%

Unicode block | Count | Frequency (%)
ASCII | 37 | 100.0%

MUNIC_MOV
Categorical

HIGH CARDINALITY
Distinct count: 146
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
Salvador | 136547 | 81.2%
Simões Filho | 6031 | 3.6%
Itaparica | 5705 | 3.4%
Camaçari | 5237 | 3.1%
Lauro de Freitas | 3265 | 1.9%
Candeias | 2975 | 1.8%
292740 | 1687 | 1.0%
Mata de São João | 1471 | 0.9%
Dias d'Ávila | 1307 | 0.8%
São Sebastião do Passé | 1063 | 0.6%
Other values (136) | 2832 | 1.7%

Length

Max length: 22
Mean length: 8.534433738
Min length: 6

Unicode category | Count | Frequency (%)
Lowercase_Letter | 23 | 50.0%
Uppercase_Letter | 11 | 23.9%
Decimal_Number | 10 | 21.7%
Space_Separator | 1 | 2.2%
Other_Punctuation | 1 | 2.2%

Unicode script | Count | Frequency (%)
Latin | 34 | 73.9%
Common | 12 | 26.1%

Unicode block | Count | Frequency (%)
ASCII | 41 | 100.0%

DIAS_PERM
Real number (ℝ≥0)

ZEROS
Distinct count: 135
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.085789911967643
Minimum: 0
Maximum: 347
Zeros: 2756
Zeros (%): 1.6%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 5
Q3: 10
95th percentile: 31
Maximum: 347
Range: 347
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 11.79771026
Coefficient of variation (CV): 1.298479315
Kurtosis: 29.31916807
Mean: 9.085789912
Mean absolute deviation (MAD): 7.507054719
Skewness: 3.712986324
Sum: 1527503
Variance: 139.1859674
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 97.5 99.5 109.5 138.5 347. ], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
2 | 25475 | 15.2%
3 | 19525 | 11.6%
4 | 16805 | 10.0%
1 | 13402 | 8.0%
5 | 12840 | 7.6%
6 | 10475 | 6.2%
7 | 9399 | 5.6%
8 | 7384 | 4.4%
9 | 4984 | 3.0%
10 | 4355 | 2.6%
Other values (125) | 43476 | 25.9%

Smallest values
Value | Count | Frequency (%)
0 | 2756 | 1.6%
1 | 13402 | 8.0%
2 | 25475 | 15.2%
3 | 19525 | 11.6%
4 | 16805 | 10.0%

Largest values
Value | Count | Frequency (%)
347 | 1 | < 0.1%
338 | 1 | < 0.1%
309 | 1 | < 0.1%
308 | 1 | < 0.1%
304 | 1 | < 0.1%

MORTE
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 152383 | 90.6%
1 | 15737 | 9.4%

NACIONAL
Real number (ℝ≥0)

SKEWED
Distinct count: 31
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.078943611705924
Minimum: 10
Maximum: 339
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 10
5th percentile: 10
Q1: 10
Median: 10
Q3: 10
95th percentile: 10
Maximum: 339
Range: 329
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.884344283
Coefficient of variation (CV): 0.2861752575
Kurtosis: 6956.403672
Mean: 10.07894361
Mean absolute deviation (MAD): 0.1576486834
Skewness: 72.1097067
Sum: 1694472
Variance: 8.319441945
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10. 15. 34.5 39.5 44.5 47.5 109.5 339. ], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
10 | 167866 | 99.8%
45 | 132 | 0.1%
35 | 16 | < 0.1%
81 | 15 | < 0.1%
71 | 14 | < 0.1%
39 | 12 | < 0.1%
37 | 7 | < 0.1%
100 | 6 | < 0.1%
21 | 5 | < 0.1%
110 | 5 | < 0.1%
Other values (21) | 42 | < 0.1%

Smallest values
Value | Count | Frequency (%)
10 | 167866 | 99.8%
20 | 3 | < 0.1%
21 | 5 | < 0.1%
30 | 2 | < 0.1%
32 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
339 | 2 | < 0.1%
333 | 4 | < 0.1%
264 | 1 | < 0.1%
260 | 1 | < 0.1%
210 | 1 | < 0.1%

INSTRU
Boolean

CONSTANT
REJECTED
Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 168120 | 100.0%

INSC_PN
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 168119 | > 99.9%
2 | 1 | < 0.1%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode category | Count | Frequency (%)
Decimal_Number | 2 | 100.0%

Unicode script | Count | Frequency (%)
Common | 2 | 100.0%

Unicode block | Count | Frequency (%)
ASCII | 2 | 100.0%

CNES
Real number (ℝ≥0)

Distinct count: 200
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1748946.4465441352
Minimum: 3778
Maximum: 9443665
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 3778
5th percentile: 3816
Q1: 4065
Median: 4294
Q3: 2802104
95th percentile: 6595197
Maximum: 9443665
Range: 9439887
Interquartile range (IQR): 2798039

Descriptive statistics

Standard deviation: 2306781.26
Coefficient of variation (CV): 1.318954771
Kurtosis: 0.8510470548
Mean: 1748946.447
Mean absolute deviation (MAD): 1905419.7
Skewness: 1.274228515
Sum: 2.940328766e+11
Variance: 5.32123978e+12
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3.7780000e+03 3.7820000e+03 3.7900000e+03 3.8010000e+03 3.8120000e+03 ... 7.1740220e+06 7.2052555e+06 7.2233155e+06 9.4134815e+06 9.4436650e+06], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
2802104 | 23464 | 14.0%
4065 | 20348 | 12.1%
6595197 | 13958 | 8.3%
4278 | 11069 | 6.6%
3980 | 9980 | 5.9%
3816 | 9762 | 5.8%
3859 | 9570 | 5.7%
2532387 | 6110 | 3.6%
4073 | 6100 | 3.6%
2602083 | 5757 | 3.4%
Other values (190) | 52002 | 30.9%

Smallest values
Value | Count | Frequency (%)
3778 | 656 | 0.4%
3786 | 1565 | 0.9%
3794 | 147 | 0.1%
3808 | 1709 | 1.0%
3816 | 9762 | 5.8%

Largest values
Value | Count | Frequency (%)
9443665 | 1767 | 1.1%
9383298 | 2 | < 0.1%
7223676 | 5 | < 0.1%
7222955 | 497 | 0.3%
7187556 | 239 | 0.1%

RACA_COR
Real number (ℝ≥0)

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.22539257673091
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 3
Median: 99
Q3: 99
95th percentile: 99
Maximum: 99
Range: 98
Interquartile range (IQR): 96

Descriptive statistics

Standard deviation: 44.06951797
Coefficient of variation (CV): 0.6275439176
Kurtosis: -1.227562598
Mean: 70.22539258
Mean absolute deviation (MAD): 40.34709319
Skewness: -0.8787333434
Sum: 11806293
Variance: 1942.122414
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 2.5 3.5 4.5 52. 99. ], "bayesian blocks" binning strategy used)
Most frequent values
Value | Count | Frequency (%)
99 | 117867 | 70.1%
3 | 38179 | 22.7%
2 | 7593 | 4.5%
1 | 3398 | 2.0%
4 | 1076 | 0.6%
5 | 7 | < 0.1%

Smallest values
Value | Count | Frequency (%)
1 | 3398 | 2.0%
2 | 7593 | 4.5%
3 | 38179 | 22.7%
4 | 1076 | 0.6%
5 | 7 | < 0.1%

Largest values
Value | Count | Frequency (%)
99 | 117867 | 70.1%
5 | 7 | < 0.1%
4 | 1076 | 0.6%
3 | 38179 | 22.7%
2 | 7593 | 4.5%

ETNIA
Categorical

MISSING
Distinct count: 2
Unique (%): < 0.1%
Missing: 36659
Missing (%): 21.8%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 131460 | 78.2%
7 | 1 | < 0.1%
(Missing) | 36659 | 21.8%

Length

Max length: 3
Mean length: 3
Min length: 3

Unicode category | Count | Frequency (%)
Lowercase_Letter | 2 | 40.0%
Decimal_Number | 2 | 40.0%
Other_Punctuation | 1 | 20.0%

Unicode script | Count | Frequency (%)
Common | 3 | 60.0%
Latin | 2 | 40.0%

Unicode block | Count | Frequency (%)
ASCII | 5 | 100.0%

DIAGSEC1
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 600
Unique (%): 9.3%
Missing: 161680
Missing (%): 96.2%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J960 | 1117 | 0.7%
Y86 | 1030 | 0.6%
A419 | 481 | 0.3%
J159 | 378 | 0.2%
J969 | 286 | 0.2%
R579 | 223 | 0.1%
J342 | 178 | 0.1%
J189 | 167 | 0.1%
J158 | 108 | 0.1%
N179 | 86 | 0.1%
Other values (590) | 2386 | 1.4%
(Missing) | 161680 | 96.2%

Length

Max length: 4
Mean length: 3.030145134
Min length: 3

Unicode category | Count | Frequency (%)
Uppercase_Letter | 24 | 66.7%
Decimal_Number | 10 | 27.8%
Lowercase_Letter | 2 | 5.6%

Unicode script | Count | Frequency (%)
Latin | 26 | 72.2%
Common | 10 | 27.8%

Unicode block | Count | Frequency (%)
ASCII | 36 | 100.0%

DIAGSEC2
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 112
Unique (%): 38.2%
Missing: 167827
Missing (%): 99.8%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J960 | 56 | < 0.1%
I10 | 18 | < 0.1%
J159 | 16 | < 0.1%
A419 | 12 | < 0.1%
J969 | 12 | < 0.1%
J189 | 11 | < 0.1%
R579 | 10 | < 0.1%
N179 | 7 | < 0.1%
J342 | 5 | < 0.1%
J348 | 5 | < 0.1%
Other values (102) | 141 | 0.1%
(Missing) | 167827 | 99.8%

Length

Max length: 4
Mean length: 3.001498929
Min length: 3

Unicode category | Count | Frequency (%)
Uppercase_Letter | 18 | 60.0%
Decimal_Number | 10 | 33.3%
Lowercase_Letter | 2 | 6.7%

Unicode script | Count | Frequency (%)
Latin | 20 | 66.7%
Common | 10 | 33.3%

Unicode block | Count | Frequency (%)
ASCII | 30 | 100.0%

DIAGSEC3
Categorical

MISSING
Distinct count: 33
Unique (%): 48.5%
Missing: 168052
Missing (%): > 99.9%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J960 | 15 | < 0.1%
J159 | 6 | < 0.1%
A419 | 5 | < 0.1%
N179 | 5 | < 0.1%
R579 | 4 | < 0.1%
R578 | 3 | < 0.1%
J189 | 3 | < 0.1%
R570 | 2 | < 0.1%
K746 | 1 | < 0.1%
I509 | 1 | < 0.1%
Other values (23) | 23 | < 0.1%
(Missing) | 168052 | > 99.9%

Length

Max length: 4
Mean length: 3.000392577
Min length: 3

Unicode category | Count | Frequency (%)
Uppercase_Letter | 12 | 50.0%
Decimal_Number | 10 | 41.7%
Lowercase_Letter | 2 | 8.3%

Unicode script | Count | Frequency (%)
Latin | 14 | 58.3%
Common | 10 | 41.7%

Unicode block | Count | Frequency (%)
ASCII | 24 | 100.0%

DIAGSEC4
Categorical

MISSING
Distinct count: 5
Unique (%): 71.4%
Missing: 168113
Missing (%): > 99.9%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
R579 | 3 | < 0.1%
A419 | 1 | < 0.1%
E877 | 1 | < 0.1%
G736 | 1 | < 0.1%
J960 | 1 | < 0.1%
(Missing) | 168113 | > 99.9%

Length

Max length: 4
Mean length: 3.000041637
Min length: 3

Unicode category | Count | Frequency (%)
Decimal_Number | 9 | 56.2%
Uppercase_Letter | 5 | 31.2%
Lowercase_Letter | 2 | 12.5%

Unicode script | Count | Frequency (%)
Common | 9 | 56.2%
Latin | 7 | 43.8%

Unicode block | Count | Frequency (%)
ASCII | 16 | 100.0%

DIAGSEC5
Categorical

MISSING
Distinct count: 1
Unique (%): 100.0%
Missing: 168119
Missing (%): > 99.9%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
B961 | 1 | < 0.1%
(Missing) | 168119 | > 99.9%

Length

Max length: 4
Mean length: 3.000005948
Min length: 3

Unicode category | Count | Frequency (%)
Decimal_Number | 3 | 50.0%
Lowercase_Letter | 2 | 33.3%
Uppercase_Letter | 1 | 16.7%

Unicode script | Count | Frequency (%)
Latin | 3 | 50.0%
Common | 3 | 50.0%

Unicode block | Count | Frequency (%)
ASCII | 6 | 100.0%

DIAGSEC6
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB

DIAGSEC7
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB

DIAGSEC8
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB

DIAGSEC9
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB
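The secondary-diagnosis columns DIAGSEC1 to DIAGSEC9 are almost entirely empty, and DIAGSEC6 to DIAGSEC9 are empty in all 168120 rows. Before cleaning, it helps to list the columns whose missing share exceeds a cutoff. The 0.99 default below is an arbitrary illustration, not a value from the report. The missing counts are copied from the variable sections, and `mostly_missing` is a hypothetical helper.

```python
from typing import Dict, List

def mostly_missing(missing_counts: Dict[str, int], n_obs: int, min_fraction: float = 0.99) -> List[str]:
    """Columns whose share of missing values is at least `min_fraction` (an arbitrary cutoff)."""
    return [name for name, m in missing_counts.items() if m / n_obs >= min_fraction]

n_obs = 168120
missing = {"DIAG_SECUN": 84081, "DIAGSEC1": 161680, "DIAGSEC2": 167827,
           "DIAGSEC3": 168052, "DIAGSEC9": 168120, "HOSP": 92037}
print(mostly_missing(missing, n_obs))  # ['DIAGSEC2', 'DIAGSEC3', 'DIAGSEC9']
```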

HOSP
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 173
Unique (%): 0.2%
Missing: 92037
Missing (%): 54.7%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
HOSPITAL SANTO ANTONIO | 23464 | 14.0%
HOSPITAL DO SUBURBIO | 13958 | 8.3%
HOSPITAL MUNICIPAL DE SIMOES FILHO | 6110 | 3.6%
HOSPITAL GERAL DE ITAPARICA | 5757 | 3.4%
HOSPITAL GERAL DE CAMACARI | 5303 | 3.2%
HOSPITAL GERAL MENANDRO DE FARIA | 3280 | 2.0%
HOSPITAL ALAYDE COSTA | 3164 | 1.9%
HOSPITAL MUNICIPAL DE CANDEIAS | 1980 | 1.2%
HOSPITAL MUNICIPAL DE SALVADOR HMS | 1767 | 1.1%
HOSPITAL MUNICIPAL DR EURICO GOULART DE FREITAS | 1471 | 0.9%
Other values (163) | 9829 | 5.8%
(Missing) | 92037 | 54.7%

Length

Max length: 58
Mean length: 13.24230312
Min length: 3

Unicode category | Count | Frequency (%)
Uppercase_Letter | 25 | 86.2%
Lowercase_Letter | 2 | 6.9%
Space_Separator | 1 | 3.4%
Decimal_Number | 1 | 3.4%

Unicode script | Count | Frequency (%)
Latin | 27 | 93.1%
Common | 2 | 6.9%

Unicode block | Count | Frequency (%)
ASCII | 29 | 100.0%

BAIRRO_RES
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 571
Unique (%): 0.5%
Missing: 58379
Missing (%): 34.7%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
São Marcos | 5691 | 3.4%
Periperi | 3437 | 2.0%
Pernambués | 3362 | 2.0%
Centro | 2842 | 1.7%
Bonfim | 2666 | 1.6%
Fazenda Grande do Retiro | 2536 | 1.5%
Nazaré | 2518 | 1.5%
Paripe | 2326 | 1.4%
Valéria | 2100 | 1.2%
São Cristóvão | 1979 | 1.2%
Other values (561) | 80284 | 47.8%
(Missing) | 58379 | 34.7%

Length

Max length: 47
Mean length: 8.117909826
Min length: 3

Unicode category | Count | Frequency (%)
Lowercase_Letter | 36 | 50.0%
Uppercase_Letter | 25 | 34.7%
Decimal_Number | 5 | 6.9%
Close_Punctuation | 1 | 1.4%
Open_Punctuation | 1 | 1.4%
Space_Separator | 1 | 1.4%
Dash_Punctuation | 1 | 1.4%
Other_Letter | 1 | 1.4%
Other_Punctuation | 1 | 1.4%

Unicode script | Count | Frequency (%)
Latin | 62 | 86.1%
Common | 10 | 13.9%

Unicode block | Count | Frequency (%)
ASCII | 59 | 100.0%

Interactions

Correlations

Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a total negative linear correlation, 0 no linear correlation, and +1 a total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exactly linear relationship, the slope of the line therefore does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
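The definition above translates directly into code. The sketch below is in plain Python, and `pearson_r` is a hypothetical name. The 1/n factors in the covariance and in the two standard deviations cancel, so they are left out.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares; the common 1/n factors cancel in the ratio.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sx * sy)

x = [1, 2, 3, 4]
print(pearson_r(x, [10, 20, 30, 40]))         # close to 1.0
print(pearson_r(x, [-3 * v + 7 for v in x]))  # close to -1.0: a negative slope flips the sign
```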

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a total negative monotonic correlation, 0 no monotonic correlation, and +1 a total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a total negative correlation, 0 no correlation, and +1 a total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
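With n(n-1)/2 in the denominator, this is the τ-a variant, which does not adjust for ties. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted τ-b. Results can therefore differ on tie-heavy columns such as UTI_MES_TO, which is 85.2% zeros. A direct O(n²) sketch of τ-a follows; `kendall_tau_a` is a hypothetical name.

```python
from itertools import combinations
from typing import Sequence

def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / (n(n-1)/2), over all pairs (O(n^2))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # the pair is ordered the same way in x and y
        elif s < 0:
            discordant += 1  # the pair is ordered in opposite ways
        # s == 0: tied in x or in y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```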

Missing values

Sample

First rows

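Row | CEP | MUNIC_RES | NASC | SEXO | UTI_MES_TO | UTI_INT_TO | VAL_SH | VAL_SP | DT_INTER | DIAG_PRINC | DIAG_SECUN | MUNIC_MOV | DIAS_PERM | MORTE | NACIONAL | INSTRU | INSC_PN | CNES | RACA_COR | ETNIA | DIAGSEC1 | DIAGSEC2 | DIAGSEC3 | DIAGSEC4 | DIAGSEC5 | DIAGSEC6 | DIAGSEC7 | DIAGSEC8 | DIAGSEC9 | HOSP | BAIRRO_RES
0 | 41760000 | Salvador | 1926-06-04 | 3 | 0 | 0 | 529.68 | 25.71 | 2009-04-23 | J440 | NaN | Salvador | 6 | 0 | 10 | 0 | 0 | 3875 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Costa Azul
1 | 40220450 | Salvador | 1930-06-30 | 1 | 0 | 0 | 614.64 | 25.71 | 2009-04-24 | J440 | NaN | Salvador | 6 | 0 | 10 | 0 | 0 | 3875 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Engenho Velho da Federação
2 | 41320010 | Salvador | 1971-03-05 | 1 | 0 | 0 | 508.74 | 78.35 | 2009-05-12 | J13 | NaN | Salvador | 4 | 0 | 10 | 0 | 0 | 3980 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Castelo Branco
3 | 48180000 | 291050 | 1990-12-04 | 1 | 3 | 0 | 1584.95 | 259.09 | 2009-03-27 | J930 | J188 | Salvador | 5 | 0 | 10 | 0 | 0 | 4065 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
4 | 40330310 | Salvador | 1964-10-29 | 1 | 0 | 0 | 1600.96 | 304.17 | 2009-03-12 | J960 | A150 | Salvador | 80 | 0 | 10 | 0 | 0 | 4065 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | IAPI
5 | 41270200 | Salvador | 1958-04-29 | 3 | 0 | 0 | 1180.02 | 203.55 | 2009-03-12 | J960 | R042 | Salvador | 63 | 0 | 10 | 0 | 0 | 4065 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Campinas de Pirajá
6 | 40323300 | Salvador | 1993-02-12 | 1 | 0 | 0 | 629.56 | 71.97 | 2009-03-16 | J960 | A150 | Salvador | 49 | 0 | 10 | 0 | 0 | 4065 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | IAPI
7 | 40250540 | Salvador | 1953-03-25 | 1 | 0 | 0 | 1455.25 | 269.34 | 2009-02-19 | J960 | A150 | Salvador | 70 | 0 | 10 | 0 | 0 | 4065 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Cosme de Farias
8 | 43805220 | Candeias | 1954-09-29 | 1 | 0 | 0 | 1099.07 | 184.20 | 2009-02-28 | J960 | A150 | Salvador | 48 | 0 | 10 | 0 | 0 | 4065 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Centro
9 | 44380000 | 290980 | 1948-04-07 | 1 | 6 | 0 | 3561.55 | 529.08 | 2009-01-31 | J960 | A150 | Salvador | 38 | 1 | 10 | 0 | 0 | 4065 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN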

Last rows

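Row | CEP | MUNIC_RES | NASC | SEXO | UTI_MES_TO | UTI_INT_TO | VAL_SH | VAL_SP | DT_INTER | DIAG_PRINC | DIAG_SECUN | MUNIC_MOV | DIAS_PERM | MORTE | NACIONAL | INSTRU | INSC_PN | CNES | RACA_COR | ETNIA | DIAGSEC1 | DIAGSEC2 | DIAGSEC3 | DIAGSEC4 | DIAGSEC5 | DIAGSEC6 | DIAGSEC7 | DIAGSEC8 | DIAGSEC9 | HOSP | BAIRRO_RES
168110 | 40243875 | Salvador | 1938-09-27 | 3 | 0 | 0 | 720.10 | 111.93 | 2008-03-11 | J152 | NaN | Salvador | 1 | 0 | 10 | 0 | 0 | 4294 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Engenho Velho de Brotas
168111 | 44550000 | 292910 | 1943-03-07 | 1 | 0 | 0 | 618.87 | 198.91 | 2008-03-22 | J90 | NaN | Salvador | 4 | 0 | 10 | 0 | 0 | 3875 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168112 | 40221500 | Salvador | 1983-03-12 | 1 | 0 | 0 | 720.10 | 111.93 | 2008-02-26 | J152 | NaN | Salvador | 2 | 1 | 10 | 0 | 0 | 4294 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Engenho Velho da Federação
168113 | 42850000 | Dias d'Ávila | 2008-04-09 | 3 | 0 | 0 | 480.07 | 74.62 | 2008-04-09 | J189 | NaN | Salvador | 2 | 0 | 10 | 0 | 0 | 4170 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168114 | 40157510 | Salvador | 2008-04-06 | 3 | 0 | 0 | 480.07 | 74.62 | 2008-04-06 | J189 | NaN | Salvador | 2 | 0 | 10 | 0 | 0 | 4170 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Chame-Chame
168115 | 42850000 | Dias d'Ávila | 2008-04-07 | 3 | 0 | 0 | 480.07 | 74.62 | 2008-04-07 | J189 | NaN | Salvador | 3 | 0 | 10 | 0 | 0 | 4170 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168116 | 41100380 | Salvador | 1928-03-30 | 3 | 0 | 0 | 480.07 | 74.62 | 2008-04-26 | J180 | NaN | Salvador | 4 | 0 | 10 | 0 | 0 | 3875 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Pernambués
168117 | 48180000 | 291050 | 1983-05-27 | 1 | 0 | 0 | 33.34 | 10.88 | 2008-03-16 | J960 | NaN | Salvador | 1 | 0 | 10 | 0 | 0 | 3875 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168118 | 40243500 | Salvador | 1965-10-06 | 1 | 0 | 0 | 609.45 | 25.45 | 2008-01-29 | J188 | NaN | Salvador | 28 | 0 | 10 | 0 | 0 | 3875 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Engenho Velho de Brotas
168119 | 41180000 | Salvador | 1934-07-06 | 1 | 0 | 0 | 649.48 | 74.62 | 2008-04-16 | J170 | NaN | Salvador | 8 | 0 | 10 | 0 | 0 | 3875 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Saboeiro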